






A Closed-form expressions for the robust risks

Neural Information Processing Systems

In Sections A.1 and A.2 we derive closed-form expressions of the standard and robust risks; we first prove Equation (13) and then the second part of the statement. Appendix B provides additional details on our experiments (B.1: neural networks on sanitized binary MNIST); if not mentioned otherwise, we use noiseless i.i.d. data. In C.1 we give an intuitive explanation for the robust overfitting phenomenon, and in C.2 we discuss how inconsistent adversarial training prevents it, shedding light on the phenomena revealed by Theorem 3.1 and Figure 2. We further discuss the robust logistic regression studied in Section 4: as observed in Section 4.4, label noise can prevent interpolation and hence improve the robust risk, so inconsistent training perturbations can induce spurious regularization effects.
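For intuition, the following is a minimal worked example of the kind of closed form such derivations produce, assuming a linear classifier with logistic loss under $\ell_\infty$-bounded perturbations of radius $\epsilon$; this is a standard computation, not necessarily the exact expression derived in the appendix.

```latex
% The inner maximization over the perturbation \delta has a closed form:
% the loss is decreasing in the margin y\,\theta^\top(x+\delta), and the
% worst case \delta = -\epsilon\, y\, \mathrm{sign}(\theta) shrinks the
% margin by \epsilon \|\theta\|_1.
\max_{\|\delta\|_\infty \le \epsilon} \log\bigl(1 + e^{-y\,\theta^\top (x+\delta)}\bigr)
  = \log\bigl(1 + e^{-\left(y\,\theta^\top x \,-\, \epsilon \|\theta\|_1\right)}\bigr),
\quad\text{so}\quad
R_{\mathrm{rob}}(\theta)
  = \mathbb{E}\Bigl[\log\bigl(1 + e^{-\left(y\,\theta^\top x - \epsilon\|\theta\|_1\right)}\bigr)\Bigr].
```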


A Instantaneous Regret Bound

Neural Information Processing Systems

Conditioned on the event that (8) in Lemma 1 holds (with probability $1-\delta$), it follows that … From (18), (19), and (20), we obtain (9), (10), and (11), respectively. We prove Lemma 4, which is then used to prove Lemma 5; Lemma 3 then follows from Lemma 5. Let us consider V-TS, which selects a single query at each BO iteration (Algorithm 3). The simulation returns the location of a pushed object given the robot's location and the pushing duration. There are 30 initial observations, i.e., $|D| = 30$.
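As a rough illustration of the single-query selection step described above, here is a minimal sketch of one Thompson-sampling iteration for BO with a GP surrogate; the RBF kernel, jitter constants, and candidate-set discretization are illustrative assumptions, not the paper's V-TS implementation.

```python
import numpy as np

def rbf_kernel(A, B, ls=0.5):
    # Squared-exponential kernel matrix between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def thompson_step(X_obs, y_obs, X_cand, noise=1e-3, rng=None):
    """Pick one query point by sampling the GP posterior on a candidate set."""
    rng = rng or np.random.default_rng()
    K = rbf_kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
    Ks = rbf_kernel(X_cand, X_obs)
    Kss = rbf_kernel(X_cand, X_cand)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))  # K^{-1} y
    mu = Ks @ alpha                                          # posterior mean
    V = np.linalg.solve(L, Ks.T)
    cov = Kss - V.T @ V                                      # posterior covariance
    f = rng.multivariate_normal(mu, cov + 1e-9 * np.eye(len(X_cand)))
    return X_cand[np.argmax(f)]  # a single query per BO iteration
```

In a BO loop, the returned point would be evaluated on the objective (e.g., the pushing simulation) and appended to `(X_obs, y_obs)` before the next iteration.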




Generalized Kernelized Bandits: Self-Normalized Bernstein-Like Dimension-Free Inequality and Regret Bounds

Metelli, Alberto Maria, Drago, Simone, Mussi, Marco

arXiv.org Machine Learning

We study the regret minimization problem in the novel setting of generalized kernelized bandits (GKBs), where we optimize an unknown function $f^*$ belonging to a reproducing kernel Hilbert space (RKHS), having access to samples generated by an exponential family (EF) noise model whose mean is a non-linear function $\mu(f^*)$. This model extends both kernelized bandits (KBs) and generalized linear bandits (GLBs). We propose an optimistic algorithm, GKB-UCB, and explain why existing self-normalized concentration inequalities do not yield tight regret guarantees. For this reason, we devise a novel self-normalized Bernstein-like dimension-free inequality by resorting to Freedman's inequality and a stitching argument, which represents a contribution of independent interest. Based on it, we conduct a regret analysis of GKB-UCB, deriving a regret bound of order $\widetilde{O}(\gamma_T \sqrt{T/\kappa_*})$, where $T$ is the learning horizon, $\gamma_T$ the maximal information gain, and $\kappa_*$ a term characterizing the magnitude of the reward nonlinearity. Our result matches, up to multiplicative constants and logarithmic terms, the state-of-the-art bounds for both KBs and GLBs, and provides a unified view of both settings.
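For a sense of the algorithmic template, here is a generic optimistic (UCB-style) query-selection step with an exponential-family mean link; this is a sketch under assumed names and a fixed exploration weight `beta`, not the authors' GKB-UCB, whose confidence width comes from their Bernstein-like inequality.

```python
import numpy as np

def ucb_step(X_obs, y_obs, X_cand, kernel, beta=2.0, lam=1.0):
    """Generic optimistic query selection for a kernelized bandit.

    `beta` is a placeholder exploration weight; the paper derives its own
    confidence width from a self-normalized Bernstein-like inequality.
    """
    K = kernel(X_obs, X_obs) + lam * np.eye(len(X_obs))
    Ks = kernel(X_cand, X_obs)
    K_inv = np.linalg.inv(K)
    mu_f = Ks @ K_inv @ y_obs                       # posterior mean of f
    var_f = kernel(X_cand, X_cand).diagonal() - np.einsum(
        "ij,jk,ik->i", Ks, K_inv, Ks)               # posterior variance of f
    link = lambda f: 1.0 / (1.0 + np.exp(-f))       # EF mean link, e.g. logistic
    index = link(mu_f + beta * np.sqrt(np.maximum(var_f, 0.0)))
    return X_cand[np.argmax(index)]                 # optimistic query
```

Any positive-definite kernel can be plugged in, e.g. the `rbf_kernel` from the earlier sketch; since the link is monotone, applying it after the bonus preserves the optimistic ordering.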


Semi-gradient DICE for Offline Constrained Reinforcement Learning

Kim, Woosung, Seo, JunHo, Lee, Jongmin, Lee, Byung-Jun

arXiv.org Artificial Intelligence

Stationary Distribution Correction Estimation (DICE) addresses the mismatch between the stationary distribution induced by a policy and the target distribution required for reliable off-policy evaluation (OPE) and policy optimization. DICE-based offline constrained RL particularly benefits from the flexibility of DICE, as it simultaneously maximizes return while estimating costs in offline settings. However, we have observed that recent approaches designed to enhance the offline RL performance of the DICE framework inadvertently undermine its ability to perform OPE, making them unsuitable for constrained RL scenarios. In this paper, we identify the root cause of this limitation: their reliance on semi-gradient optimization, which solves a fundamentally different optimization problem and leads to failures in cost estimation. Building on these insights, we propose a novel method that enables both OPE and constrained RL through semi-gradient DICE. Our method ensures accurate cost estimation and achieves state-of-the-art performance on the offline constrained RL benchmark DSRL.
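To make the semi- vs. full-gradient distinction the abstract builds on concrete, here is a toy sketch on a linear TD-style objective (not the DICE objective itself); all names and constants are illustrative.

```python
import numpy as np

def gradient_variants(w, phi, phi_next, r, gamma=0.99, lr=0.1):
    """Toy linear value model v(s) = w @ phi(s) with squared bootstrapped error.

    The full (residual) gradient differentiates through the bootstrapped
    target r + gamma * v(s'); the semi-gradient treats that target as a
    constant (a "stop-gradient"), which changes the optimization problem
    and its fixed point -- the crux of the failure mode described above.
    """
    delta = w @ phi - (r + gamma * (w @ phi_next))      # TD-style residual
    w_full = w - lr * delta * (phi - gamma * phi_next)  # full-gradient step
    w_semi = w - lr * delta * phi                       # semi-gradient step
    return w_full, w_semi
```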